Detecting Mental Health Issues Through Social Media Using NLP and Machine Learning

Authors: Aditya Yadav, Adarsh Tiwari, Aman Kumar, Ashwini Kumar Verma

DOI Link: https://doi.org/10.22214/ijraset.2025.69105

Abstract

This study proposes a new AI framework that mines social media for the early, preemptive detection and diagnosis of mental disorders, including but not limited to depression, anxiety, bipolar disorder, and schizophrenia. The framework combines natural language processing (NLP), machine learning (ML), feature learning with Convolutional Neural Networks (CNNs), and classification with XGBoost. Approximately 200,000 posts were collected from Reddit and Twitter using the Pushshift API and Tweepy. Preprocessing steps, including tokenization, stop-word removal, and vectorization with TF-IDF and Word2Vec, prepared the data for analysis. The hybrid CNN-XGBoost model achieved an accuracy of 92.3% for classifying depression posts, 89.7% for anxiety, 86.4% for bipolar disorder, and 81.5% for schizophrenia. These results exceed the classification performance of traditional models based on Support Vector Machine (SVM) and Random Forest algorithms. To close the trust and ethical gaps common to such frameworks, Explainable Artificial Intelligence (XAI) methods, SHAP and LIME, were integrated, which substantially improved the system's trustworthiness and verifiability. Class imbalance was addressed through oversampling, specifically the Synthetic Minority Oversampling Technique (SMOTE). A central goal was to substantiate the model's reliability claims while accounting for the underrepresentation of some contexts in the data. Anticipated practical applications include cross-disciplinary mental health monitoring on social media, which offers opportunities for proactive engagement through early predictive algorithms. The model's proactive detection capabilities could relieve some of the operational strain on conventional healthcare systems.
Further research, including multilingual and clinical validation, will focus on broadening the model's scope and improving its practical usability. As the development of AI tools accelerates, this tool serves as an early example of an ethically grounded, impactful AI application in mental health care.
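The preprocessing steps named in the abstract (tokenization, stop-word removal, TF-IDF vectorization) can be illustrated with a minimal sketch. It assumes a simple regex tokenizer, a tiny hypothetical stop-word list (`STOP_WORDS`), and the common TF-IDF variant tf(t, d) × log(N / df(t)); the paper does not specify its tokenizer, stop-word list, or weighting scheme, and a production pipeline would more likely rely on a library vectorizer.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"i", "a", "the", "and", "to", "of", "is", "am", "so", "my"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute tf(t, d) * log(N / df(t)) for every term in every document.

    tf is the raw count divided by document length; a term occurring in
    every document gets weight 0 because log(N / N) = 0.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Invented example posts.
posts = [
    "I am so tired and hopeless",
    "Feeling anxious and tired before the exam",
    "Great day at the park with my dog",
]
processed = [remove_stop_words(tokenize(p)) for p in posts]
weights = tfidf(processed)
```

In the first post, `hopeless` receives a higher weight than `tired` because `tired` also occurs in the second post, which lowers its inverse document frequency.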

Introduction

Mental health disorders are a significant global challenge, often misdiagnosed due to stigma and barriers to care. Early detection is vital, and social media platforms offer rich data that AI can analyze to identify mental health issues like depression and anxiety at an early stage. Research shows that linguistic and visual content on platforms like Twitter, Reddit, and Instagram can reveal emotional distress indicators.

Traditional clinical methods (interviews, self-reporting) are time-consuming and subjective. AI models that analyze social media data offer a faster and more objective way to detect symptoms. Integrating AI with social media could enable real-time monitoring and early intervention, reduce healthcare burdens, and help normalize mental health discussions and reduce stigma.

However, ethical concerns such as privacy, consent, bias, and transparency must be addressed. Techniques like SHAP and LIME help make AI predictions explainable, building trust with users and clinicians.
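To show what SHAP approximates, the sketch below computes exact Shapley values for a toy, hypothetical word-level scoring function by enumerating every coalition of features, with absent words contributing nothing (a baseline of 0). This brute-force version is exponential in the number of features and is for illustration only; the SHAP library uses faster model-specific or sampling-based estimators, and the weights in `WEIGHTS` and `INTERACTION` are invented, not taken from the paper's model.

```python
from itertools import combinations
from math import factorial

# Hypothetical per-word weights of a toy "depression risk" score.
WEIGHTS = {"hopeless": 2.0, "tired": 0.5, "alone": 1.5}
INTERACTION = 1.0  # toy bonus when "hopeless" and "alone" co-occur


def score(present: frozenset[str]) -> float:
    """Toy model output given the set of words present in a post."""
    value = sum(WEIGHTS[w] for w in present)
    if {"hopeless", "alone"} <= present:
        value += INTERACTION
    return value


def shapley_values(features: list[str]) -> dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (score(s | {f}) - score(s))
        values[f] = total
    return values


phi = shapley_values(["hopeless", "tired", "alone"])
```

The three values (2.5, 0.5, 2.0) sum to the full-post score of 5.0 minus the empty baseline of 0.0, the efficiency property that makes Shapley-based attributions additive; the interaction bonus is split evenly between `hopeless` and `alone`.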

The study reviews previous work that uses machine learning and NLP, including deep learning models such as CNNs and gradient-boosted tree ensembles such as XGBoost, to classify mental health conditions from social media text. It highlights challenges such as data imbalance and the need for transparent, ethical AI frameworks.

Using data from Twitter and Reddit, the study developed a hybrid CNN-XGBoost model to detect depression, anxiety, bipolar disorder, and schizophrenia with high accuracy, outperforming traditional models like SVM and Random Forest. Oversampling techniques addressed class imbalance, and explainability methods improved fairness and transparency.
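In the hybrid model, the CNN acts as a feature extractor whose output is passed to XGBoost. A minimal pure-Python sketch of that stage is shown below, assuming the common text-CNN design (a 1D convolution over word embeddings, a ReLU, then max-over-time pooling); the embedding size, filter widths, and all numeric values are hypothetical, since the paper does not describe its exact architecture. In practice the filters would be learned with a framework such as TensorFlow or PyTorch, and each pooled vector would become one row of XGBoost's input matrix.

```python
# A post is a sequence of word embeddings, one list of floats per token.
Embedding = list[float]
Filter = list[list[float]]  # shape: [window][embedding_dim]


def conv1d(seq: list[Embedding], kernel: Filter, bias: float = 0.0) -> list[float]:
    """Slide the kernel over consecutive token windows ('valid' padding, stride 1)."""
    window = len(kernel)
    outputs = []
    for start in range(len(seq) - window + 1):
        acc = bias
        for offset in range(window):
            acc += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        outputs.append(acc)
    return outputs


def relu(values: list[float]) -> list[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def cnn_features(seq: list[Embedding], filters: list[Filter]) -> list[float]:
    """One feature per filter: max over time of ReLU(conv1d(seq, filter)).

    Posts shorter than a filter's window yield 0.0 for that filter.
    """
    return [max(relu(conv1d(seq, k)), default=0.0) for k in filters]


# Hypothetical 2-D embeddings for the three tokens of a short post.
post = [[0.1, 0.0], [0.0, 0.2], [0.9, 0.8]]
filters = [
    [[1.0, 1.0]],              # width-1 filter: responds to single tokens
    [[0.5, 0.0], [0.0, 0.5]],  # width-2 filter: responds to token pairs
]
features = cnn_features(post, filters)  # fixed-length vector for the classifier
```

Max-over-time pooling gives every post a vector with one entry per filter regardless of post length, which is what allows a tree-based classifier such as XGBoost to consume the CNN's output.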

The research aims to provide an ethical, accurate AI-based tool for early mental health detection from social media, improving timely care and supporting clinical decisions.

Conclusion

This research presents a framework for identifying mental health conditions, including depression, anxiety, bipolar disorder, and schizophrenia, from social media data using artificial intelligence techniques. By combining Convolutional Neural Networks (CNNs) for feature extraction with XGBoost for classification, the framework achieves accuracies of 92.3% for depression and 89.7% for anxiety. Applying the Explainable Artificial Intelligence (XAI) techniques SHAP and LIME makes the model's predictions transparent, which supports clinical applicability. In addition, the Synthetic Minority Oversampling Technique (SMOTE) mitigates class imbalance, improving performance for data-scarce conditions such as schizophrenia ([1],[5],[6],[8]). The primary contribution of this research is an interpretable AI system that combines high classification accuracy with sound reasoning for its predictions. This transparency addresses the classic "black box" problem of AI systems, which erodes clinicians' trust in them. The research also follows XAI ethics principles of fairness and accountability, including attention to bias across multiple dimensions. The hands-on evaluation supports the practicality of deploying the system on social networks, which creates opportunities for active mental health care delivery at scale. The findings also open avenues for future work on integrating multimodal data, adding multilingual capabilities, and collaborating actively with clinicians across specialties, populations, and clinical settings ([2], [5], [9]). This study underscores the potential of AI to support the early detection of mental health conditions from social media data. Although AI can automate reasoning at scale, the system still needs to be validated in clinical settings.
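SMOTE creates synthetic minority-class samples rather than duplicating existing ones: for a randomly chosen minority sample, it picks one of its k nearest minority neighbours and places a new point at a random position on the segment between them. The pure-Python sketch below illustrates this under simple assumptions (Euclidean distance, dense feature vectors, a fixed random seed); it is not the imbalanced-learn implementation the authors may have used, and the toy points are invented.

```python
import math
import random


def k_nearest(point: list[float], others: list[list[float]], k: int) -> list[list[float]]:
    """Return the k minority samples closest to `point` (Euclidean), excluding itself."""
    candidates = [o for o in others if o is not point]
    candidates.sort(key=lambda o: math.dist(point, o))
    return candidates[:k]


def smote(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors for an under-represented class (e.g. schizophrenia posts).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote(minority, n_new=4, k=2)
balanced_minority = minority + new_samples
```

Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them. SMOTE is normally applied to the training split only, so that no synthetic points leak into the evaluation data.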
Future work should pursue further clinical testing. Extending the system to process other forms of data, such as images or voice, could enrich its functionality, and streamlining it for clinicians' convenience could improve its usability. If these challenges are met, the proposed framework has the potential to transform the management of mental health conditions and significantly improve global mental health care outcomes. AI chiefly contributes automation and scalability, which support proactive diagnosis. Addressing ethical concerns builds trust in the system and tempers over-reliance on technology, which makes deploying such a system one of AI's most demanding undertakings.

References

[1] De Choudhury, M., et al. (2013). “Predicting depression via social media.” Proc. Seventh Int. AAAI Conf. Weblogs Soc. Media, Boston, MA, USA, pp. 128–137.
[2] Shen, J. H., et al. (2017). “Detecting anxiety through Reddit.” Proc. Fourth Workshop Comput. Linguist. Clin. Psychol., Vancouver, Canada, pp. 58–65.
[3] Chen, X., et al. (2023). “Tweeting your mental health.” Proc. Hawaii Int. Conf. Syst. Sci.
[4] Reece, A. G., et al. (2017). “Instagram photos reveal predictive markers.” EPJ Data Sci., vol. 6, no. 15, pp. 1–20.
[5] Gkotsis, G., et al. (2017). “Characterization of mental health conditions.” Sci. Rep., vol. 7, pp. 451–461.
[6] Halim, Z., et al. (2020). “Driving-induced stress detection.” Inf. Fusion.
[7] Mansoor, M. A., & Ansari, K. H. (2024). “Early Detection of Mental Health Crises through AI-Powered Social Media Analysis.” J. Pers. Med., vol. 14, pp. 1–15.
[8] Pushshift API. “Tool for collecting Reddit data for NLP research.” https://github.com/pushshift/api.
[9] Tweepy API. “Twitter data collection and analysis for mental health detection.” https://www.tweepy.org/.
[10] Hugging Face Models. “Pre-trained NLP models for text classification tasks like BERT and RoBERTa.” https://huggingface.co/models.
[11] TensorFlow and PyTorch. Frameworks for building and training machine learning models.
[12] Google Colab. “Free cloud-based platform for implementing and testing ML models.” https://colab.research.google.com/.
[13] eRisk Dataset (CLEF Initiative). “Early risk prediction of mental health issues like depression.” http://early.irlab.org/.
[14] GoEmotions Dataset (Google AI). “Emotion dataset with mental health-related labels.” https://ai.google/tools/datasets/goemotions/.
[15] Sentiment140. “Twitter dataset for sentiment analysis, useful for mental health studies.” http://help.sentiment140.com/.
[16] Reddit Mental Health Datasets. “Posts from subreddits like r/depression and r/anxiety.” https://pushshift.io/.
[17] CLPsych Shared Tasks. “Challenges and datasets for computational linguistics and clinical psychology.” https://clpsych.org/.
[18] SHAP (SHapley Additive exPlanations). “Tool for model interpretability in AI research.” https://shap.readthedocs.io/.
[19] LIME (Local Interpretable Model-agnostic Explanations). “A technique for interpreting machine learning models.” https://lime-ml.readthedocs.io/.

Copyright

Copyright © 2025 Aditya Yadav, Adarsh Tiwari, Aman Kumar, Ashwini Kumar Verma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET69105

Publish Date : 2025-04-17

ISSN : 2321-9653

Publisher Name : IJRASET
